Using Protein Interaction Database and Support Vector Machine to Improve Gene Signatures for Prediction of Breast Cancer Recurrence

Authors

Mohammadreza Sehhati

Alireza Mehri Dehnavi

Hossein Rabbani

Shaghayegh Haghjoo Javanmard

Abstract

Background: Numerous studies have used microarray gene expression data to extract metastasis-driving gene signatures for the prediction of breast cancer relapse. However, the accuracy and generality of the previously introduced biomarkers are not sufficient for reliable use on independent datasets. This inadequacy is attributed to simple feature selection methods, which ignore gene interactions because of the computational burden of accounting for them.

Materials and Methods: In this study, an integrated approach with low computational cost was proposed for identifying a more predictive gene signature for the prediction of breast cancer recurrence. First, a small set of genes was selected as the primary signature by an appropriate filter feature selection (FFS) method. Then, a binary sub-class of the protein-protein interaction (PPI) network was used to expand the primary set by adding the proteins adjacent to each signature gene in the PPI network. Subsequently, the support vector machine-based recursive feature elimination (SVM-RFE) method was applied to the expression levels of all the genes in the expanded set. Finally, the genes with the highest SVM-RFE scores were selected as the new biomarkers.

Results: The accuracy of the final selected biomarkers was evaluated by classifying four breast cancer datasets, comprising 800 cases, into two cohorts of poor and good prognosis. The results of the five-fold cross-validation test, using the support vector machine as the classifier, showed more than 13% improvement in average accuracy after modifying the primary selected signatures. Moreover, the method used in this study showed a lower computational cost than the other PPI-based methods.

Conclusions: The proposed method yielded more robust and accurate biomarkers using the PPI network, at a low computational cost. This approach could be used as a supplementary procedure in microarray studies after applying various gene selection methods.
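The expansion step described in the abstract (adding the PPI neighbours of each primary signature gene) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_adjacency` and `expand_signature`, the gene labels, and the toy edge list are all hypothetical, and the filter selection and SVM-RFE ranking steps are not shown.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn a list of undirected binary interactions into an adjacency map."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def expand_signature(primary: Iterable[str],
                     adjacency: Dict[str, Set[str]]) -> List[str]:
    """Return the primary genes plus their direct PPI neighbours, sorted."""
    expanded = set(primary)
    # Iterate over a snapshot so only first-order neighbours are added.
    for gene in list(expanded):
        expanded |= adjacency.get(gene, set())
    return sorted(expanded)


# Hypothetical toy data: gene names and interactions are illustrative only.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G4", "G5"), ("G5", "G6")]
toy_primary = ["G1", "G5"]  # stand-in for the filter-selected primary set

adjacency = build_adjacency(toy_edges)
expanded = expand_signature(toy_primary, adjacency)
print(expanded)  # ['G1', 'G2', 'G3', 'G4', 'G5', 'G6']
```

In the workflow the abstract outlines, the expanded list would then be passed to a ranking method such as SVM-RFE, and only the top-scoring genes would be kept as the final signature.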

Similar resources

Development of Novel Breast Cancer Recurrence Prediction Model Using Support Vector Machine

PURPOSE The prediction of breast cancer recurrence is a crucial factor for successful treatment and follow-up planning. The principal objective of this study was to construct a novel prognostic model based on support vector machine (SVM) for the prediction of breast cancer recurrence within 5 years after breast cancer surgery in the Korean population, and to compare the predictive performance o...

Sequence-based protein-protein interaction prediction via support vector machine

This paper develops sequence-based methods for identifying novel protein-protein interactions (PPIs) by means of support vector machines (SVMs). The authors encode proteins not only in the gene level but also in the amino acid level, and design a procedure to select negative training set for dealing with the training dataset imbalance problem, i.e., the number of interacting protein pairs is sc...

Simulation and prediction of scour whole dimensions downstream of siphon overflow using support vector machine and Gene expression programming algorithms

Background and Objectives: The purpose of this study is to simulate and predict the dimensions of the scour cavity downstream of the siphon overflow using the SVM model and compare it with other numerical methods. The use of the SVM algorithm as a meta-heuristic system in simulating complex processes in which the dependent variable is a function of several independent variables has been widely ...

A Domain-Based Frequency Count Approach for Protein-Protein Interaction Prediction using Support Vector Machine

Proteins are involved in many essential processes within the cell. Uncovering the diverse function of proteins and their interactions within the cell may improve our understanding of protein functions. Several high-throughput techniques employed to decipher PPI are erroneous and are limited by the lack of coverage. Computational techniques are therefore sought to predict genome-wide PPI. In this pa...

Using Wavelet Support Vector Machine for Fault Diagnosis of Gearboxes

Identifying fault categories, especially for compound faults, is a challenging task in mechanical fault diagnosis. For this task, this paper proposes a novel intelligent method based on wavelet packet transform (WPT) and multiple classifier fusion. An unexpected damage on the gearbox may break the whole transmission line down. It is therefore crucial for engineers and researchers to monitor the...

Journal:
Journal of Medical Signals and Sensors

Volume 3, Issue 2, Pages 0-0
